
    A Standardised Benchmark for Assessing the Performance of Fixed Radius Near Neighbours

    Many agent-based models require agents to have an awareness of their local peers. The handling of these fixed radius near neighbours (FRNNs) is often a limiting factor of performance. However, without a standardised metric to assess the handling of FRNNs, contributions to the field lack the rigorous appraisal necessary to expose their relative benefits. This paper presents a standardised specification of a multi-agent-based benchmark model. The benchmark model provides a means for the objective assessment of FRNN performance through the comparison of implementations. Results collected from implementations of the benchmark model under three agent-based modelling frameworks show the 64-bit floating point performance of each framework to scale linearly with agent population; in contrast, the GPU-accelerated framework's 32-bit floating point performance only became linear after maximal device utilisation, at around 100,000 agents.
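As an illustration of the kind of FRNN handling the benchmark targets, the sketch below implements a minimal uniform-grid neighbour query in Python: with cell size equal to the search radius, each query only inspects the 3×3 block of cells around an agent. The function names and data layout are illustrative assumptions, not part of the benchmark specification.

```python
import math
from collections import defaultdict

def build_grid(positions, radius):
    """Bin 2D agent positions into square cells of side `radius`."""
    grid = defaultdict(list)
    for i, (x, y) in enumerate(positions):
        grid[(int(x // radius), int(y // radius))].append(i)
    return grid

def neighbours(positions, grid, radius, i):
    """Indices of agents within `radius` of agent i (excluding i)."""
    x, y = positions[i]
    cx, cy = int(x // radius), int(y // radius)
    result = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for j in grid.get((cx + dx, cy + dy), []):
                if j != i and math.dist(positions[i], positions[j]) <= radius:
                    result.append(j)
    return result

positions = [(0.0, 0.0), (0.5, 0.0), (3.0, 3.0)]
grid = build_grid(positions, radius=1.0)
print(neighbours(positions, grid, 1.0, 0))  # agent 1 is within radius; agent 2 is not
```

The grid rebuild-then-query pattern is what frameworks typically parallelise; its per-query cost depends on local density rather than total population, which is why performance curves such as those in the paper hinge on how well this step maps to the hardware.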

    3D visual speech animation using 2D videos

    In visual speech animation, lip motion accuracy is of paramount importance for speech intelligibility, especially for the hard of hearing or foreign language learners. We present an approach for visual speech animation that uses tracked lip motion in front-view 2D videos of a real speaker to drive the lip motion of a synthetic 3D head. This makes use of a 3D morphable model (3DMM), built using 3D synthetic head poses, with corresponding landmarks identified in the 2D videos and the 3DMM. We show that using a wider range of synthetic head poses for different phoneme intensities to create the 3DMM, as well as a combination of front and side photographs of the real speakers rather than just front photographs to produce the initial neutral 3D synthetic head poses, gives better animation results when compared to ground truth data consisting of front-view 2D videos of real speakers.
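At its core, a 3DMM represents any head shape as a mean shape plus a linear combination of basis vectors of shape variation. The sketch below shows only that reconstruction step, with a toy mean and basis; the real model in the paper is built from synthetic head poses and speaker photographs, so all names and dimensions here are illustrative.

```python
import numpy as np

def reconstruct_shape(mean, basis, coeffs):
    """3DMM reconstruction: shape = mean + basis @ coeffs.

    mean  : (3N,)   flattened mean vertex positions (x, y, z per vertex)
    basis : (3N, K) basis vectors of shape variation
    coeffs: (K,)    shape coefficients for one head/pose
    """
    return mean + basis @ coeffs

# Toy example: 2 vertices (6 coordinates), 2 basis vectors.
mean = np.zeros(6)
basis = np.eye(6)[:, :2]          # each basis vector moves one coordinate
coeffs = np.array([0.5, -0.25])   # fitted per frame when driving animation
shape = reconstruct_shape(mean, basis, coeffs)
# shape offsets the first two coordinates by the two coefficients
```

Driving a 3D head from 2D video then amounts to estimating `coeffs` per frame so that the model's projected landmarks match the tracked lip landmarks.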

    Towards insect inspired visual sensors for robots

    Flying insects display a repertoire of complex behaviours that are facilitated by their non-standard visual system, which, if understood, would offer solutions for weight- and power-constrained robotic platforms such as micro unmanned aerial vehicles (MUAVs). Crucial to this goal is revealing the specific features of insect eyes that engineered solutions would benefit from possessing. However, progress in exploration of the design space has been limited by challenges in accurately replicating insect vision. Here we propose that emerging ray-tracing technologies are ideally placed to realise high-fidelity replication of the insect visual perspective in a rapid, modular and adaptive framework, allowing development of technical specifications for a new class of bio-inspired sensor. A proof-of-principle insect eye renderer is shown, and insights into the research directions it affords are discussed.

    Two-dimensional batch linear programming on the GPU

    This paper presents a novel, high-performance, graphics processing unit (GPU)-based algorithm for efficiently solving two-dimensional linear programs in batches. The domain of two-dimensional linear programs is particularly useful due to the prevalence of relevant geometric problems. Batch linear programming refers to solving numerous different linear programs within one operation. By solving many linear programs simultaneously and distributing the workload evenly across threads, GPU utilization can be maximized. Speedups of over 22 times and 63 times are obtained against state-of-the-art GPU and CPU linear program solvers, respectively.
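The batching idea can be illustrated on CPU, with a loop standing in for GPU threads, by solving many independent 2D LPs in one call. The brute-force solver below enumerates constraint-pair intersections and keeps the best feasible vertex; it is a sketch of the problem setup only, not the paper's GPU algorithm, and the function names are assumed for illustration.

```python
import itertools
import numpy as np

def solve_2d_lp(c, A, b, eps=1e-9):
    """Maximise c @ x subject to A @ x <= b by vertex enumeration.

    Returns (best_value, best_x), or (None, None) if no feasible
    vertex exists. A bounded 2D LP attains its optimum where two
    constraint boundaries intersect.
    """
    best_val, best_x = None, None
    for i, j in itertools.combinations(range(len(A)), 2):
        M = np.array([A[i], A[j]])
        if abs(np.linalg.det(M)) < eps:
            continue  # parallel boundaries: no unique intersection
        x = np.linalg.solve(M, np.array([b[i], b[j]]))
        if np.all(A @ x <= b + eps):  # vertex satisfies every constraint
            val = c @ x
            if best_val is None or val > best_val:
                best_val, best_x = val, x
    return best_val, best_x

def solve_batch(problems):
    """Solve a batch of independent (c, A, b) problems; on a GPU each
    problem would be assigned to its own thread (group) instead."""
    return [solve_2d_lp(c, A, b) for c, A, b in problems]

# Maximise x + y subject to x <= 1, y <= 1, x >= 0, y >= 0.
c = np.array([1.0, 1.0])
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([1.0, 1.0, 0.0, 0.0])
val, x = solve_2d_lp(c, A, b)
print(val)  # 2.0, attained at the vertex (1, 1)
```

Because every problem in the batch is independent, the outer loop in `solve_batch` is embarrassingly parallel, which is what makes the batched formulation a good fit for GPU execution.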

    The impact of the Lombard effect on audio and visual speech recognition systems

    When producing speech in noisy backgrounds, talkers reflexively adapt their speaking style in ways that increase speech-in-noise intelligibility. This adaptation, known as the Lombard effect, is likely to have an adverse effect on the performance of automatic speech recognition systems that have not been designed to anticipate it. However, previous studies of this impact have used very small amounts of data and recognition systems that lack modern adaptation strategies. This paper aims to rectify this by using a new audio-visual Lombard corpus containing speech from 54 different speakers – significantly larger than any previously available – and modern state-of-the-art speech recognition techniques. The paper is organised as three speech-in-noise recognition studies. The first examines the case in which a system trained exclusively on normal speech is presented with Lombard speech. It was found that the Lombard mismatch caused a significant decrease in performance even when the level of the Lombard speech was normalised to match the level of normal speech. However, the size of the mismatch was highly speaker-dependent, which explains the conflicting results of previous, smaller studies. The second study compares systems trained in matched conditions (i.e., training and testing with the same speaking style). Here, Lombard speech affords a large increase in recognition performance. Part of this is due to the greater energy leading to a reduction in noise masking, but performance improvements persist even after the effect of the signal-to-noise level difference is compensated. An analysis across speakers shows that the Lombard speech energy is spectro-temporally distributed in a way that reduces energetic masking, and this reduction in masking is associated with an increase in recognition performance. The final study repeats the first two using a recognition system trained on visual speech. In the visual domain, performance differences are not confounded by differences in noise masking. It was found that in matched conditions, Lombard speech supports better recognition performance than normal speech. The benefit was consistently present across all speakers, but to a varying degree. Surprisingly, the Lombard benefit was observed to a small degree even when training on mismatched non-Lombard visual speech, i.e., the increased clarity of the Lombard speech outweighed the impact of the mismatch. The paper presents two generally applicable conclusions: i) systems that are designed to operate in noise will benefit from being trained on well-matched Lombard speech data; ii) the results of speech recognition evaluations that employ artificial speech and noise mixing need to be treated with caution: they are overly optimistic to the extent that they ignore a significant source of mismatch, but at the same time overly pessimistic in that they do not anticipate the potentially increased intelligibility of the Lombard speaking style.

    Augmented reality safety zone configurations in human-robot collaboration: a user study

    Close interaction with robots in Human-Robot Collaboration (HRC) can increase worker productivity in production, but cages around the robot often prevent it. Our research aims to visualise virtual safety zones around a real robot arm with Augmented Reality (AR), thereby replacing the cages. We tested our system with a collaborative pick-and-place application that mimics a real manufacturing scenario in an industrial robot cell. The shape, size and visualisation of the AR safety zones were tested with 19 participants. The overwhelming preference was for a visualisation that used cylindrical AR safety zones together with a virtual cage-bars effect.

    Depth-aware neural style transfer for videos

    Temporal consistency and content preservation are the prominent challenges in artistic video style transfer. To address these challenges, we present a technique that utilizes depth data, and we demonstrate it on real-world videos from the web as well as on a standard video dataset of three-dimensional computer-generated content. Our algorithm employs an image-transformation network combined with a depth encoder network for stylizing video sequences. For improved global structure preservation and temporal stability, the depth encoder network encodes ground-truth depth information, which is fused into the stylization network. To further enforce temporal coherence, we employ ConvLSTM layers in the encoder and a loss function based on depth information calculated for the output frames. We show that our approach is capable of producing stylized videos with improved temporal consistency compared to state-of-the-art methods, whilst also successfully transferring the artistic style of a target painting.
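A depth-aware training objective of this kind has the structure of a weighted sum of content, style, and depth-preservation terms. The sketch below shows that structure only, using raw-pixel L2 distances; in practice the content and style terms compare deep network features (e.g. VGG activations), and the weights and term definitions here are illustrative assumptions rather than the paper's exact loss.

```python
import numpy as np

def l2(a, b):
    """Mean squared difference between two equally shaped arrays."""
    return float(np.mean((a - b) ** 2))

def total_loss(stylized, content, style_target, depth_pred, depth_gt,
               w_content=1.0, w_style=10.0, w_depth=1.0):
    """Combined objective: content + style + depth-preservation terms.

    The depth term penalises stylization that distorts the scene's
    depth structure, encouraging global structure preservation.
    """
    return (w_content * l2(stylized, content)
            + w_style * l2(stylized, style_target)
            + w_depth * l2(depth_pred, depth_gt))

rng = np.random.default_rng(0)
frame = rng.random((4, 4))
loss = total_loss(frame, frame, frame, np.ones((4, 4)), np.ones((4, 4)))
print(loss)  # 0.0 when every term's inputs match
```

The relative weights control the trade-off the abstract describes: raising `w_depth` favours structure preservation over stylization strength.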